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Abstract 

With the help of metadynamics it is possible to calculate efficiently the free energy of systems 
displaying high energy barriers as a function of few selected "collective variables". In doing this, 
the contribution of all the other degrees of freedom ("microscopic" variables) is averaged out and, 
thus, lost. In the following, it is shown that it is possible to calculate the thermal average of these 
microscopic degrees of freedom during the metadynamics, not loosing this piece of information. 
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Metadynamics is an algorithm developed by Laio and Parrinello [l|, |2J to calculate the 
free energy of complex systems as a function of some slow-varying collective variables (CV). 
It consists in simulating the dynamics by adding to its energy function, at regular intervals, 
a non-Markovian term which depends on the collective variables and which disfavours the 
sampling of regions already visited. This algorithm is particularly efficient in calculating 
the free energy of systems displaying large energy barriers, since the non-Markovian term 
fills effectively the free energy wells, allowing the system to move fast to other regions of the 
conformational space. 

In order to make the algorithm efficient, CV must be chosen in such a way that the motion 
of the system in the normal directions is fast and does not encounter major energy barriers. 
Once one obtains the free energy as a function of the CV, it is possible to calculate thermal 
averages for any function of the CV, just performing a weighted summation. The problem 
is that all information concerning variables other than CV, accounting for the microscopic 
motion of the system, is lost. Of course, one could first perform a metadynamic run, obtain 
the free energy as a function of the CV, and then perform a standard dynamics with a 
potential which is the sum of the system potential and the non-Markovian term. Recording 
the values of the microscopic variable of interest during this second run, and rescaling it 
to subtract the effect of the non-Markovian potential, will give its thermal average. The 
drawbacks of this method is not only that one has to perform two calculations, loosing 
all information on the microscopic variable collected in the former, but also that the latter, 
being a random walk in a flat energy surface, cannot distinguish between thermo dynamically 
important and non-important regions, and consequently must sample exhaustively all parts 
of conformational space in order to provide the correct averages. 

The idea is that, during the sampling between successive updates of the non-Markovian 
potential, the system can equilibrate small regions of the conformational space. The equili- 
brated regions usually correspond to local minima of the time-varying free energy. They can 
be very small, but their number very large and change during the simulation, thus covering 
the whole conformational space and, in particular, regions corresponding to the minima of 
the true free energy where the non-Markovian potential accumulates. Collecting together 
the averages of microscopic variables in these small regions and weighting them properly 
provides the correct value of the associated thermal averages. 

Let r be the conformational space and 7 € T be a conformation of the system, 2(7) a 
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collective variable and 1/(7) a fast-varying quantity whose average one wishes to know, V v (x) 
the non-Markovian potential after v updates, Atp the time interval between two updates 
and £7(7) the potential energy of the system. Assume that in the time interval from vAtp 
to {y + l)AtD which follows the z/th update, the system has sampled exhaustively a region, 
however small, A v C T. This is the same hypothesis needed by metadynamics to work. If we 
indicate with square parentheses the average of any quantity over the region A v calculated 
by the metadynamics algorithm for the evolution under the total potential U + V u , then the 
ergodic theorem assures that 

ivU = yH vb)e-W*W*™\ (1) 
v ieA v 

where Z v is the partition function restricted to the visited region. One can thus calculate 
the thermodynamic average of y restricted to the visited region as 

E vb)*-™ = [y^] A • ( 2 ) 

The quantity Z v can also be calculated applying Eq. ([I]) to a constant function, obtaining 

±-n(A v ) = [e?V+W] Av , (3) 

where Q(A U ) is the volume of the region A v . From Eq. (T5]) one wishes to reconstruct the 
actual thermodynamic average of y, collecting together the partial averages in each region 
A v visited. In order to weight correctly the contribution coming from multiple visits of the 
same region of conformational space, let's partition it into regions Aj. Each A v can thus 
be seen as a collection of a number of regions Aj. The contribution of the region Aj to the 
thermodynamic average of y is 

where n s (Aj) is the number of times the system has visited region Aj at different v and Z v 
is calculated from Eq. ([3]). The sum labelled by v : Aj C A u is meant as over all v such 
that Aj belongs to A v . The average of y is then the sum of the contribution of all regions 
Aj, that is 

7 i SK % > u:AiCA v 
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The partition function is found setting y equal to a constant, which gives 

*=Es^j E M^'k- («) 

i v 7 v.AiCAu 

What remains to be done is to give an operative definition of the regions Aj. The easiest 
choice is to use for this purpose a bin of the collective variable x, and thus the sum over % 
can be substituted by the sum over x. 

An implementation of this algorithm is thus: 

1. At each step after the i/th deposition, record x, y, U and V v . 

2. Before the (v + l)th deposition, calculate the histogram of visited x. Define a thresh- 
old on frequencies of the histogram in order to neglect the poorly visited bins. Iden- 
tify the connected region of x whose histogram lies above the threshold and con- 
tains the maximum, and define this set as A v . Calculate [exp(/3V^)] ; [y exp((3V u )} 
and [exp(— (3{U + V v ))\ over A v . Calculate Z v from Eq. (Ej) assuming that Q.(A U ) is 
proportional to the range of x spanned. 

3. At the end of the simulation, calculate n s (Aj) from the recorded A v . Then calculate 
the total partition function from Eq. (jSJ) and the average < y > from Eq. (jSJ). 

We have tested the above idea on a simple, two-dimensional system controlled by the 
energy function 

U(x,y) = -2exp[-(x + 2) 2 /2]-exp[-(x-2) 2 /2]+x 2 /20 + xy/10 + exp(?/ 5 )+exp(-?/ 5 ), (7) 

which displays two wells, separated by a barrier along the ^-direction (see Fig. [T]) whose 
height is of the order of unity, in the arbitrary energy units defined by Eq. (CO). Within 
each well, the system does not display any barrier, complying with the requirement that y 
is fast-varying. The system is constrained to the region — 10 < x < 10, — 1 < y < 1. 

Performing metadynamic calculations employing x as collective variable and lasting for 
10 7 steps (with deposition time tjj = 50000 steps (i.e., very long, the dependence of the 
equilibration ability on tp will be studied in the following), height of the Gaussian functions 
equal to 0.005, standard deviation 0.01, binning of x 0.01, threshold on the histogram 
10~ 3 , diffusion coefficient 2.5 ■ 10~ 3 step -1 ), one obtains the thermal average < y > of the 
"microscopic" quantity y. The value of < y > obtained from this calculation is displayed in 



Fig. [I] as a function of the temperature, together with the actual value of < y > calculated 
as Z~ l J dxdy y exp(—U(x,y)/T), where Z is the normalization factor. The agreement is 
good at low temperatures and worsen at high temperatures. This is not unexpected, since 
metadynamics is designed to overcome large energy barriers (i.e., 3> T), not to speed up the 
sampling of flat free-energy surfaces. 

The value of to used above is very large, in order to check the correctness of the algorithm 
in the conditions under which Eq. ([1]) certainly holds. To be computationally efficient, to 
has to be as small as possible |2j . In the upper panel of Fig. [2] is displayed the value of 
< y > at T = 0.1 and T = 0.01, calculated from simulations performed with different update 
times to- At both temperatures the average of y reaches its true value for large tjj. For 
low to, < y > is largely overstimated, as a result of the fact that the system is not able to 
equilibrate regions along the y-direction, and thus the algorithm fails. The threshold of to 
needed for such an equilibration seems to be strongly dependent on the temperature. While 
at T = 0.01 a deposition time of 100 is enough to obtain a < y > with a 5% error, at T = 0.1 
one needs a to of the order of 10 4 . 

In the lower panel of Fig. [2] is displayed the time needed by < y > and by the free 
energy F(x) to reach their true values at T = 0.1. While < y > is correctly com- 
puted in 10 6 steps, the free energy converges to the equilibrium one, given by F(x) = 
— TlogJ dy exp(—U(x,y)/T) in 10 7 steps, indicating that the thermal average of the mi- 
croscopic variable is correclty calculated during the sampling, and not after that the meta- 
dynamics has flattened the free energy surface. The fast convergence of < y > with respect 
to F(x) is associated with the fact that the value of < y > is determined mainly by small, 
low free energy regions of conformational space, while a indicates the convergence of the 
whole F(x). 

Summing up, the values assumed by microscopic variables during a metadynamic run can 
be collected and weighted properly in order to obtain their thermal average at no further 
cost. The hypotheses under which the algorithm works are that V u converges to the true 
free energy of the system, that it is possible to label regions where the system equilibrates 
(i.e., to define non-empty A u ) and that it is possible to evaluate the volume fl(A u ) of such 
regions. 

In order to test the effects of roughness in the energy landscape along the y direction, we 



5 



have added to the potential of Eq. ([7]) a term 

where h is an energy parameter which we can be tuned. In Fig. [3] it is shown the behaviour 
of the predicted < y > with respect to h. At the lower temperature T = 0.01 and using 
deposition times tn — 200 or to = 2000, the correct value of the average is obtained up to 
h = 0.2. Above this value, the system is no longer able to diffuse along the y directions and 
thus the resulting average is incorrect. At T = 0.1, using to = 10 4 , the algorithm is able 
to calculate a reasonable value of < y > up to h = 1, although the associated error ranges 
from 5% to 20% as h is increased from 0.1 to 1. 

The case discussed above is a simple example where the averages can be calculated 
analyticaly, meant to illustrate the algorithm. A further test has been performed on a more 
realistic system, that is dialanine, which has been widely characterized 3] in terms of the 
Ramachandran dihedrals and if). In particular, it has been shown to display at 300K a 
barrier of several kT along the direction, while it is smoother along the ip direction. The 
thermal average < if) > calculated through a metadynamic run using both and if) as CV 
gives 76°. Performing a metadynamics using as CV only (f> (with updating time 0.6 ps, height 
of the Gaussians equal to 0.1 kcal/mol) gives the result reported in Fig. HI indicating that 
a good estimate of can be found after approximately 1500 updates of the non-Markovian 
term. 

We have thus shown that from a metadynamics run it is possible to extract the thermal 
averages of micorscopic quantities, under the same hypotheses which allow metadynamics 
to work. 
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FIG. 1: (left) The potential energy of the system as described by Eq0 Each level of the contour plot 
corresponds to 0.2 energy units, (right) The thermal average < y > as a function of temperature 
obtained from metadynamic calculations (solid curve) and integrated numerically (dashed curve). 
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FIG. 2: (upper panel) The thermal average < y > at T = 0.1 (lower solid curve) and T = 0.01 
(upper solid curve) as a function of the deposition time tp- The dashed line indicate their respective 
actual value, (lower panel) The thermal average < y > (solid curve) and the standard error a of 
the calculated free energy with respect to the actual one, as functions of the simulation length t at 



o 



0.2 0.4 h 0.6 0.8 1 




FIG. 3: The values of < y > calculated placing barriers of different heights h along the y-coordinate. 
The dashed curve indicates the correct value, (above) The vaule of < y > obtained at T = 0.01 
using to = 200 (solid curve) and tr> = 2000 (dotted curve), (below) The same at T = 0.1 and 
to = 10 4 . The simulations lasted for 10 4 iz). 
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FIG. 4: The values of < tp > of dialanine, calculated with the above algorithm, as a function of 
the number of updates of the non-Markovian term (each of 0.6 ps). The dashed line is the true 
value. 
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